Objective

This  project  has  two  complementary  goals:  to  develop  new  methods  for  video  tracking  using  multiple,  physically 
distributed  cameras  without  relying  on  centralized  servers;  and  to  develop  new  algorithms  for  tracking  using  infrared  images. 

Approach 

Distributed smart cameras: We must devise peer-to-peer tracking algorithms that exchange information directly
between  cameras  and  processors.  Central  servers  simplify  programming  but  make  the  system  less  reliable.  At  the  heart  of  a 
distributed  smart  camera  is  a  protocol  that  allows  the  cameras  to  exchange  information  reliably. 

Infrared tracking models: We must develop new target models that allow the tracker to identify targets and to
distinguish  the  identity  of  targets  in  a  multi-target  scene.  These  models  must  be  robust  in  the  sense  that  they  are  relatively 
invariant  to  changes  in  the  orientation  of  the  target  relative  to  the  cameras.  They  must  also  be  computationally  inexpensive.  We 
are  interested  in  fusing  infrared  and  visible  images  from  physically  separated  cameras. 

Scientific  Barriers 

Distributed  tracking  using  infrared  cameras  requires  advances  in  two  distinct  areas:  distributed  computing  and  image 
processing/computer  vision. 

We  must  develop  target  models  for  infrared  that  are  robust  to  changes  in  target  orientation,  illumination,  and 
environmental  conditions.  In  general,  the  best  results  are  obtained  by  fusing  data:  fusion  of  multiple  channels,  such  as  visible 
and infrared; and fusion of estimates from the physically distributed nodes in the system. We are particularly interested in
physically separated cameras, because practical systems will not always have a visible camera and an infrared camera paired at every
node.

We  must  develop  peer-to-peer  protocols  that  allow  reliable  operation  of  a  distributed  set  of  cameras  and  processors. 
Protocols  are  at  the  heart  of  distributed  computing  systems.  These  protocols  must  operate  in  real  time  to  avoid  losing  tracking 
data. 


Significance 

Infrared  tracking  is  an  important  modality  in  harsh  environments.  The  combination  of  infrared  and  visible  imagery  is 
especially powerful. Infrared imaging encompasses a broad range of infrared frequencies that have different characteristics, but
infrared imagers can see through some kinds of haze and can identify people and vehicles in poor lighting conditions. We
believe that physically distributed visible and infrared cameras are particularly important to Army applications. Since visible
cameras are cheaper, they will be more plentiful. Infrared cameras may not always be co-located with visible cameras. Even if
they  are  co-located,  it  is  difficult  to  accurately  align  the  two  cameras  since  they  use  different  optics.  We  need  to  develop 
algorithms  that  can  tolerate  infrared/visible  misalignment. 

Distributed  smart  cameras  offer  both  improved  tracking  performance  and  fault  tolerance.  When  a  target  is  tracked  by  a 
single camera, the target may be occluded by an obstacle, forcing the system to lose track of the target. When a scene is
covered by several physically separated cameras, the subject is much more likely to be visible to at least one camera in the
scene. Using distributed computers to process the imagery makes the system fault tolerant: the network can continue to
function when individual nodes are lost. Distributed tracking, when properly designed, also provides lower latency, meaning
that targets are identified and tracked more quickly. We argue that traditional multi-camera, server-based systems are
inherently undeployable in the field. Server-based architectures require raw video to be sent to the central server. Even if
wireless networks allow the video to be transferred, the network requires a large amount of power to send the large volumes of
data. This large data transfer is also vulnerable to interception, jamming, and other security problems.

Accomplishments 

We have concentrated on new results in infrared/visible image fusion and analysis. These results were greatly helped by
the equipment from our DURIP award.

We  have  achieved  several  significant  goals  over  the  past  year  relating  to  both  infrared  image  analysis  and  peer-to-peer 
architectures: 

•  We  developed  new  object  models  for  multi-spectral  imaging,  including  infrared  and  visible  light  cameras.  These 
models  do  not  require  cameras  to  be  optically  aligned. 

•  We  developed  a  true  peer-to-peer,  fault-tolerant  tracking  system. 

•  We  developed  new  methods  for  multi-camera  calibration  and  gesture  recognition  from  multi-spectral  camera  setups, 
including  infrared  and  visible  cameras. 

Result:  We  developed  new  object  models  for  multi-spectral  imaging.  These  models  are  useful  in  both  single-camera  and 
multi-camera  systems.  We  use  both  infrared  and  visible  imagery  to  separate  background  from  foreground.  These  models  use 
confidence  factors  to  determine  the  relative  importance  of  infrared  and  visible  data  in  each  region  of  the  image.  For  instance,  if 
both  channels  agree  on  whether  a  given  region  is  foreground  or  background,  that  assessment  is  given  higher  confidence. 
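
The confidence-factor idea can be illustrated with a minimal sketch. It is not the model used in this project: the function name, the averaging rule, and the `agree_boost` factor are assumptions chosen only to show how agreement between the two channels can raise the confidence of a region's label.

```python
def fuse_region(p_ir, c_ir, p_vis, c_vis, agree_boost=1.5):
    """Fuse foreground estimates for one image region (illustrative only).

    p_ir, p_vis: foreground probability from each channel, in [0, 1].
    c_ir, c_vis: positive confidence factors for each channel.
    agree_boost: hypothetical multiplier applied when both channels agree.
    Returns (is_foreground, confidence).
    """
    # Confidence-weighted average of the two channel estimates.
    fused_p = (c_ir * p_ir + c_vis * p_vis) / (c_ir + c_vis)
    confidence = (c_ir + c_vis) / 2.0
    # Agreement between the channels raises the confidence of the decision.
    if (p_ir >= 0.5) == (p_vis >= 0.5):
        confidence *= agree_boost
    return fused_p >= 0.5, confidence
```

When the channels disagree, the more confident channel dominates the label and no boost is applied.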


[Figure 1 panels: Visible only; New IR/visible model]

Figure 1: A comparison of target splitting using visible only vs. our combined infrared/visible target model.

We  also  use  an  object  model  to  help  distinguish  objects  and  resolve  occlusion  problems.  In  the  visible  channel,  we 
estimate  the  direction  of  illumination  from  shadows  to  help  determine  the  true  extent  of  an  object.  In  the  infrared  channel,  we 
assume that the target’s heat signature does not change significantly during the video sequence. We use confidence measures
to combine the information from the channels. Figure 1 shows results from this model, which improves the separation of
targets.

Impact:  These  new  models  provide  more  accurate  tracking  of  human  subjects  from  single  and  multiple  node  camera 
systems. 

Result: To provide fault-tolerant distributed camera networks, we developed SCCS, a peer-to-peer tracking system.
This system uses a protocol to communicate between camera nodes. The system does not rely on a central server. It keeps
redundant data at the various nodes so that cameras can estimate the position of objects even when a node fails. The nodes do
not trade raw video data; they only communicate object models consisting of appearance and position information. When an
object is occluded from the view of one camera, that node communicates with other nodes to determine the position of the
occluded object. We have demonstrated this system on a three-node setup. Each node has a visible camera and its own
processor, with the nodes connected by a standard network.
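
The redundancy idea can be sketched in a few lines. This is an in-process toy, not the SCCS protocol: the class names, fields, and the `connect` helper are hypothetical, and direct method calls stand in for network messages. It shows only that nodes exchange compact object models rather than raw video, and that a replica survives the loss of the node that produced it.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    object_id: int
    position: tuple     # (x, y) in a shared coordinate frame (assumed)
    appearance: tuple   # e.g. a coarse colour histogram (assumed)


@dataclass
class CameraNode:
    name: str
    alive: bool = True
    peers: list = field(default_factory=list)
    local_tracks: dict = field(default_factory=dict)  # objects this camera sees
    replicas: dict = field(default_factory=dict)      # copies received from peers

    def observe(self, model):
        """Record a locally tracked object and replicate it to live peers."""
        self.local_tracks[model.object_id] = model
        for peer in self.peers:
            if peer.alive:
                peer.replicas[model.object_id] = model

    def occlude(self, object_id):
        """The object is no longer visible to this camera."""
        self.local_tracks.pop(object_id, None)

    def locate(self, object_id):
        """Use the local track if present, otherwise fall back to a replica."""
        model = self.local_tracks.get(object_id) or self.replicas.get(object_id)
        return model.position if model else None


def connect(nodes):
    """Fully connect the nodes; there is no central server."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
```

For example, if node b observes an object and then fails, nodes a and c can still report the object's last known position from their replicas.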

To  support  this  peer-to-peer  architecture,  we  developed  a  novel  algorithm  for  synchronizing  cameras  using  image 
processing.  The  cameras  must  be  synchronized  so  that  equivalent  frames  are  compared.  Our  algorithm  uses  tracking-based 
search to find the best match between video frames.
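
The report gives no further detail on the search, so the following is only a sketch of the general idea under strong assumptions: both cameras track the same target, the two trajectories are already expressed in a common coordinate frame, and the cameras differ by a whole number of frames. The function name and the squared-distance cost are illustrative choices.

```python
def best_frame_offset(track_a, track_b, max_offset):
    """Find the frame offset d that best aligns two trajectories.

    track_a, track_b: lists of (x, y) target positions, one per frame.
    Returns the d in [-max_offset, max_offset] that minimises the mean
    squared distance between track_a[i] and track_b[i + d].
    """
    best_d, best_cost = 0, float("inf")
    for d in range(-max_offset, max_offset + 1):
        # Pair up the frames that overlap under this candidate offset.
        pairs = [(track_a[i], track_b[i + d])
                 for i in range(len(track_a))
                 if 0 <= i + d < len(track_b)]
        if len(pairs) < 2:
            continue
        cost = sum((ax - bx) ** 2 + (ay - by) ** 2
                   for (ax, ay), (bx, by) in pairs) / len(pairs)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

An exhaustive search like this is only practical for small offset ranges; it is meant to show the matching criterion, not an efficient implementation.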

Impact:  We  believe  that  this  is  the  first  fault-tolerant  tracking  system.  The  cameras  share  no  central  computation  or 
communication  resource.  Information  is  stored  redundantly  around  the  network  so  that  nodes  can  continue  to  operate  when 
other  nodes  drop  out. 


Result: We created a new set of visible/infrared benchmark videos for tracking. These benchmarks were made with two
pods of cameras that were located about 50 feet apart. The visible sequences were shot in high-definition (1080p) video. The
infrared sequences were shot with FLIR thermal cameras.

Impact:  These  benchmarks  have  allowed  us  to  study  more  closely  the  relationship  between  visible  and  infrared  images. 
We  hope  to  release  these  benchmarks  on  the  World  Wide  Web. 

Result:  We  generalized  our  multi-modal  data  fusion  model  to  handle  non-co-located  cameras. 


[Figure 1 panels: Infrared input image; Visible input image; Transformed infrared image]

Figure 1: Transforming the coordinate space of an infrared image.

Figure 2: Tracking results.


Visible and infrared cameras may not always come in pairs; we may have infrared cameras placed separately from the
visible  cameras.  Our  previous  visible/infrared  fusion  method  required  that  the  cameras  be  co-located  so  that  pixels  could  be 
matched  in  the  two  images.  We  developed  geometric  transformation  algorithms  to  translate  the  infrared  image  into  the  visible 
camera  coordinate  system.  These  transformations  require  3-D  information  to  be  able  to  adjust  for  the  height  of  the  target. 
Figure  1  shows  the  original  visible  and  infrared  frames,  which  were  taken  from  cameras  that  were  about  50  feet  apart.  The 
bottom  infrared  image  was  transformed  into  the  visible  image’s  coordinate  system:  the  ground  plane  and  the  target  required 
separate  transformations. 

We then generalized our multi-modal data fusion algorithm to handle the transformed infrared image. The results were
somewhat disappointing: performance did not improve when the transformed image was fused with the visible image,
compared with the visible image alone. This is because pixels must be compared to identify regions in the image, and it is
difficult to transform coordinate spaces accurately enough to properly register the two sets of pixels. This is particularly true
since the target’s transformation changes as it moves.

Based  on  this  experience,  we  plan  to  concentrate  on  object-level  fusion  algorithms  over  the  next  reporting  period. 
These  algorithms  will  generate  regions  from  the  individual  images,  then  fuse  them  at  the  object  level  using  approximate 
geometric  relationships  between  the  two  cameras. 
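
Because these algorithms are still planned, the following is only one possible shape for object-level fusion, not a description of completed work. It assumes each camera has already produced blobs with an approximate ground-plane foot position in a shared frame; the `foot` key, the `gate` distance, and the greedy matching rule are all hypothetical.

```python
import math


def fuse_objects(visible_blobs, infrared_blobs, gate=1.5):
    """Greedy object-level association between two cameras (illustrative).

    Each blob is a dict with 'foot' = (x, y), an approximate ground-plane
    position in a shared frame. Pairs farther apart than `gate` are never
    matched. Returns a list of (visible_index, infrared_index_or_None).
    """
    # All candidate pairs, closest first.
    candidates = sorted(
        (math.dist(v["foot"], r["foot"]), i, j)
        for i, v in enumerate(visible_blobs)
        for j, r in enumerate(infrared_blobs))
    used_visible, used_infrared, pairs = set(), set(), {}
    for distance, i, j in candidates:
        if distance > gate:
            break  # every remaining candidate is farther away still
        if i not in used_visible and j not in used_infrared:
            used_visible.add(i)
            used_infrared.add(j)
            pairs[i] = j
    return [(i, pairs.get(i)) for i in range(len(visible_blobs))]
```

Matching whole objects within a distance gate tolerates the registration errors that defeat pixel-level fusion, because only approximate geometry between the cameras is needed.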

Impact: To our knowledge, this is the first demonstration of fusing data from widely separated visible/infrared cameras. We will use
this experience to develop new object-based visible/infrared fusion methods.

Result: We measured the communication performance of MPI (Message Passing Interface). MPI is a middleware system for grid computing,
designed originally to support large scientific computations. We used it as the communication layer for SCCS. MPI offered
sufficient performance for a three-node system, but we wanted to learn whether MPI would be fast enough for a large network
of cameras.


[Figure 3 plot: "MPI Performance"; axis label: Time in Seconds]

Figure 3: Communication speed of MPI as a function of number of processing nodes.

Figure  3  shows  the  results  of  our  performance  measurement.  Communication  time  goes  past  one  second  for  a  network 
of  25  nodes.  Since  camera  networks  of  this  size  are  quite  possible  in  practical  applications,  we  believe  that  MPI  will  not  scale 
to  practical  smart  camera  networks.  We  believe  that  a  new  middleware  architecture  that  emphasizes  real-time  communication 
is  necessary  to  build  deployable  smart  camera  systems. 
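
To put the one-second figure in perspective, a small calculation helps. The 30 frames/sec video rate used here is an assumption made for illustration (the measurement above does not state a camera frame rate), and the function name is hypothetical.

```python
def frames_behind(comm_time_s, frame_rate_hz=30.0):
    """Frame periods that elapse during one communication round.

    comm_time_s: measured time for one round of communication, in seconds.
    frame_rate_hz: assumed camera frame rate (30 frames/sec by default).
    """
    return comm_time_s * frame_rate_hz
```

Under that assumption, a one-second communication round spans 30 frame periods, so position estimates exchanged between nodes would lag the video by about 30 frames.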

Impact: These results justify the design of a new middleware communication mechanism that is better suited to
peer-to-peer, real-time communications.

Result:  We  developed  a  new  hardware  architecture  for  real-time,  embedded  background  elimination.  This  system 
implements  mixture-of-Gaussian  background  elimination.  It  runs  at  30  frames/sec  on  an  FPGA  platform. 
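
For readers unfamiliar with the technique, the following software sketch shows mixture-of-Gaussian background modeling for a single grayscale pixel, in the commonly used adaptive form. It is not the FPGA design: the class name, the parameter values, and the use of a single learning rate are simplifying assumptions.

```python
import math


class PixelMoG:
    """Adaptive mixture-of-Gaussians background model for ONE grayscale pixel."""

    def __init__(self, k=3, alpha=0.05, match_sigmas=2.5,
                 init_var=225.0, min_var=4.0, bg_weight=0.7):
        self.alpha = alpha                # learning rate (assumed value)
        self.match_sigmas = match_sigmas  # match window in standard deviations
        self.init_var = init_var
        self.min_var = min_var            # variance floor to keep the model stable
        self.bg_weight = bg_weight        # weight mass treated as background
        self.weights = [1.0 / k] * k
        self.means = [255.0 * i / max(k - 1, 1) for i in range(k)]
        self.vars = [init_var] * k

    def _ranked(self):
        # Components ordered from most to least background-like.
        return sorted(range(len(self.weights)),
                      key=lambda i: self.weights[i] / math.sqrt(self.vars[i]),
                      reverse=True)

    def update(self, value):
        """Fold in one sample; return True if it is classified as foreground."""
        matched = None
        for i in self._ranked():
            if abs(value - self.means[i]) <= self.match_sigmas * math.sqrt(self.vars[i]):
                matched = i
                break
        if matched is None:
            # No component explains the sample: replace the weakest one.
            weakest = min(range(len(self.weights)), key=lambda i: self.weights[i])
            self.means[weakest] = float(value)
            self.vars[weakest] = self.init_var
            self.weights[weakest] = self.alpha
        else:
            for i in range(len(self.weights)):
                hit = 1.0 if i == matched else 0.0
                self.weights[i] = (1 - self.alpha) * self.weights[i] + self.alpha * hit
            d = value - self.means[matched]
            self.means[matched] += self.alpha * d
            self.vars[matched] = max(
                (1 - self.alpha) * self.vars[matched] + self.alpha * d * d,
                self.min_var)
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        # The top-ranked components covering bg_weight of the mass are background.
        background, mass = set(), 0.0
        for i in self._ranked():
            background.add(i)
            mass += self.weights[i]
            if mass > self.bg_weight:
                break
        return matched is None or matched not in background
```

A full implementation keeps one such model per pixel. The per-pixel independence of the update is what makes the method a natural fit for parallel hardware.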

Based on our experience with the FPGA implementation, we plan to experiment with the TI DaVinci processor as a
platform.

Impact:  This  architecture  could  lead  to  new  embedded  systems  for  computer  vision. 

Result: We developed new methods for fusing data from unaligned visible and infrared cameras. This method combines
two techniques, homography and pseudo-3D transformations, to transform the images into a common global
image space. Homography is used first to transform the infrared image to align its ground plane with that of the visible image.
A simplified 3-D model is then used to erect the target off the ground plane so that it more accurately corresponds to the
target’s position in the visible image.
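
The two steps can be sketched as follows. The homography step is the standard projective mapping of a point by a 3x3 matrix. The second function is only a guess at what a simplified 3-D step might look like: it assumes an upright target whose foot lies on the ground plane, and the `height_scale` parameter is hypothetical.

```python
def apply_homography(H, point):
    """Map an image point through a 3x3 homography H (list of three rows)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def erect_target(H, foot, top, height_scale):
    """Place an upright target in the visible image (illustrative only).

    foot, top: the target's lowest and highest points in the infrared image.
    The foot lies on the ground plane, so it is mapped through H. The top is
    rebuilt directly above the mapped foot, because a ground-plane homography
    would smear points that are off the plane.
    height_scale: assumed ratio of target height in the two images.
    """
    foot_x, foot_y = apply_homography(H, foot)
    infrared_height = foot[1] - top[1]  # image y grows downward
    return (foot_x, foot_y), (foot_x, foot_y - height_scale * infrared_height)
```

In practice the homography would be estimated from ground-plane correspondences between the two views, and the height scale would depend on the camera geometry and the target's position.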

[Figure 4 panels: visible; infrared; infrared after homography; infrared after homography and pseudo 3D]

Figure 4: Alignment of images from separated visible and infrared cameras.

Figure  4  shows  the  results  of  a  test  of  the  algorithm  on  images  from  physically  separated  cameras.  Our  initial  results 
show  a  slight  improvement  in  precision  over  traditional  fusion.  We  continue  to  work  on  these  results. 

Impact:  This  technique  allows  us  to  make  use  of  infrared  cameras  that  are  not  paired  with  visible  cameras.  Optically 
aligning  visible  and  infrared  cameras  is  bulky  and  expensive.  Unaligned  camera  methods  will  allow  us  to  deploy  infrared 
cameras  in  more  realistic  environments. 


Result:  We  developed  a  multilevel  Bayesian  network  model  for  multi-band  image  fusion.  Each  band  has  its  own  image 
attributes — size,  aspect  ratio,  color,  brightness,  etc. — which  can  then  be  fused  by  additional  levels  of  Bayesian  models.  Our 
model,  shown  in  Figure  5,  is  based  on  the  multilevel  Bayesian  model  of  Singha. 


Figure 5: Bayesian network model for multi-band image fusion.


In Figure 5, the first level is the object level, where detected blobs are classified as objects or non-objects. The second level is the camera
level, which includes an infrared camera and several visible cameras; at this level we collect the probabilities from all cameras. The
third level is the feature level, where we calculate the probabilities of all features of the blobs in each camera. We expand Formula 4
as follows:

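\[
\begin{aligned}
P(\mathit{Object} \mid D, \xi) = {} & w_{IR}\,\bigl[P(\mathit{Object} \mid IR, \xi)\,P(IR \mid D, \xi) + P(\mathit{Object} \mid \overline{IR}, \xi)\,P(\overline{IR} \mid D, \xi)\bigr] \\
& + w_{VS_1}\,\bigl[P(\mathit{Object} \mid VS_1, \xi)\,P(VS_1 \mid D, \xi) + P(\mathit{Object} \mid \overline{VS_1}, \xi)\,P(\overline{VS_1} \mid D, \xi)\bigr] + \cdots
\end{aligned}
\tag{5}
\]

\[
\begin{aligned}
P(IR \mid D, \xi) = {} & w_{Size}\,\bigl[P(IR \mid \mathit{Size}, \xi)\,P(\mathit{Size} \mid D, \xi) + P(IR \mid \overline{\mathit{Size}}, \xi)\,P(\overline{\mathit{Size}} \mid D, \xi)\bigr] \\
& + w_{AR}\,\bigl[P(IR \mid AR, \xi)\,P(AR \mid D, \xi) + P(IR \mid \overline{AR}, \xi)\,P(\overline{AR} \mid D, \xi)\bigr] \\
& + w_{B}\,\bigl[P(IR \mid B, \xi)\,P(B \mid D, \xi) + P(IR \mid \overline{B}, \xi)\,P(\overline{B} \mid D, \xi)\bigr]
\end{aligned}
\tag{6}
\]

\[
\begin{aligned}
P(VS \mid D, \xi) = {} & w_{Size}\,\bigl[P(VS \mid \mathit{Size}, \xi)\,P(\mathit{Size} \mid D, \xi) + P(VS \mid \overline{\mathit{Size}}, \xi)\,P(\overline{\mathit{Size}} \mid D, \xi)\bigr] \\
& + w_{AR}\,\bigl[P(VS \mid AR, \xi)\,P(AR \mid D, \xi) + P(VS \mid \overline{AR}, \xi)\,P(\overline{AR} \mid D, \xi)\bigr] \\
& + w_{C}\,\bigl[P(VS \mid C, \xi)\,P(C \mid D, \xi) + P(VS \mid \overline{C}, \xi)\,P(\overline{C} \mid D, \xi)\bigr]
\end{aligned}
\tag{7}
\]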

where Equation 5 is for the first level, Equation 6 is for the infrared camera on the second level, and Equation 7 is for the visible
camera on the second level. Object means the detected blob is classified as an object, and w is the weight. IR means the
infrared camera agrees that the detected blob is an object; IR with an overbar means the infrared camera disagrees. VS,
Size, AR (aspect ratio), B (brightness), and C (color) are defined similarly. To simplify the model, we set thresholds for
size, aspect ratio, brightness, and color for classification.
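
Equations 5 through 7 share one pattern: a weighted sum in which each term marginalizes over whether a child node agrees or disagrees. The sketch below evaluates that pattern. The function name and all numeric values are made up for illustration, and the code assumes that the weights at each level sum to one and that the probability of disagreement is one minus the probability of agreement.

```python
def weighted_level(terms):
    """Evaluate one level of the multilevel model.

    terms: list of (w, p_given_child, p_given_not_child, p_child), where
    p_child is P(child | D, xi). Each term contributes
    w * [P(parent | child) P(child) + P(parent | not child) (1 - P(child))].
    """
    return sum(w * (p_yes * p_child + p_no * (1.0 - p_child))
               for w, p_yes, p_no, p_child in terms)


# Feature level to camera level (Equations 6 and 7). With thresholded
# features, P(feature | D, xi) is simply 0 or 1.
p_ir = weighted_level([(0.4, 0.9, 0.2, 1.0),   # size passes its threshold
                       (0.3, 0.8, 0.3, 0.0),   # aspect ratio fails
                       (0.3, 0.7, 0.1, 1.0)])  # brightness passes
p_vs = weighted_level([(0.4, 0.9, 0.2, 1.0),
                       (0.3, 0.8, 0.3, 1.0),
                       (0.3, 0.7, 0.1, 1.0)])  # color passes

# Camera level to object level (Equation 5), one infrared and one visible camera.
p_object = weighted_level([(0.5, 0.95, 0.10, p_ir),
                           (0.5, 0.90, 0.15, p_vs)])
```

With these illustrative numbers, the infrared camera's failed aspect-ratio test lowers its vote, but the visible camera's agreement keeps the object probability above one half.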

Preliminary  results  show  that  this  model  reduces  precision  somewhat  but  improves  both  recall  and  F-measure. 

Impact:  A  more  general  model  for  multi-band  image  fusion  has  two  advantages.  First,  it  should  help  reduce  our 
dependence on optically aligned sensors since each band is responsible for identifying its own features. Second, it provides a
framework  that  should  allow  us  to  integrate  new  modalities. 

Result: We developed improved methods for target monitoring. ID agreement uses a Markov Chain Monte Carlo
(MCMC) method based on object kernel histograms and motion vectors. We use a threshold to determine whether an
entering object is new. The object ID with the highest score above the threshold is assigned to the
entering object; if no score exceeds the threshold, the object is treated as new.
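
The thresholding rule can be sketched as follows. The MCMC sampling and the motion vectors are omitted. The score used here is the Bhattacharyya coefficient between normalized histograms, which is a stand-in chosen for illustration and not necessarily the score used in our system; the function names and the `next_id` argument are likewise hypothetical.

```python
import math


def bhattacharyya(h1, h2):
    """Similarity in [0, 1] between two normalised histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def assign_id(entering_hist, known, threshold, next_id):
    """Pick an ID for an entering object.

    known: dict mapping existing object IDs to normalised histograms.
    Returns (object_id, is_new). The best-scoring known ID is reused only
    if its score exceeds the threshold; otherwise next_id is issued.
    """
    best_id, best_score = None, threshold
    for object_id, hist in known.items():
        score = bhattacharyya(entering_hist, hist)
        if score > best_score:
            best_id, best_score = object_id, score
    if best_id is None:
        return next_id, True
    return best_id, False
```

A higher threshold makes the tracker more willing to issue new IDs, while a lower one risks merging distinct objects that look alike.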

In Figure 6, each column shows frames from the same camera at different times. When a new object enters camera2, the system
first assigns it a new ID (004). After camera2 checks with camera1 using MCMC, camera2 assigns the object the correct ID (002),
the same ID it has in camera1.


[Figure 6 panels: Camera1; Camera2]

Figure 6: MCMC-based target identification.


Awards 

Wolf  received  the  IEEE  Circuits  and  Systems  Society  Education  Award  in  2006. 

Wolf  was  named  Rhesa  P.  “Ray”  Farmer  Distinguished  Chair  and  Georgia  Research  Alliance  Eminent  Scholar  at  the 
Georgia  Institute  of  Technology.  This  chair  was  established  to  promote  embedded  computing;  the  Georgia  Research  Alliance 
component  promotes  entrepreneurship  and  technology  transfer.  Sample  press  releases  on  the  appointment: 

• http://www.gatech.edu/news-room/release.php?id=1346

• http://www.gra.org/uploads/newsarticle/Wayne%20Wolf%20Press%20Release%20-%20Final.doc

Wolf  received  an  honorary  doctorate  from  the  University  of  Patras,  Greece  in  October  2008. 


Publications 

• S. Velipasalar, J. Schlessman, C.-Y. Chen, W. Wolf, J. P. Singh, “SCCS: A Scalable Clustered Camera System for
Multiple Object Tracking Communicating via Message Passing Interface,” IEEE International Conference on
Multimedia & Expo, July 2006.

• S. Velipasalar, C. H. Lin, J. Schlessman, W. Wolf, “Design and verification of communication protocols for
peer-to-peer multimedia systems,” IEEE International Conference on Multimedia & Expo, July 2006.

• Cheng-Yao Chen and Wayne Wolf, “Background modeling and object tracking using multi-spectral sensors,” in
Proceedings,  ACM  Workshop  on  Video  Surveillance  and  Sensor  Networks,  ACM,  2006. 

•  Jason  Schlessman,  Ikdong  Kim,  Jaechang  Shim,  Yun  Cheol  Baek,  and  Wayne  Wolf,  “Low  power,  low  cost  wireless 
camera  sensor  nodes  for  human  detection,”  in  Proceedings,  Sensys  06,  ACM,  2006. 

• Cheng-Yao Chen, Distributed Multi-Modal Human Activity Analysis: From Algorithms to Systems, Ph.D. Dissertation,
Princeton  University,  August  2007. 

•  Senem  Velipasalar  and  Wayne  H.  Wolf,  “Frame-level  temporal  calibration  of  video  sequences  from  unsynchronized 
cameras,” Machine Vision and Applications Journal, DOI 10.1007/s00138-008-0122-6, January 24, 2008.

• Senem Velipasalar, Jason Schlessman, Cheng-Yao Chen, Wayne H. Wolf, and Jaswinder P. Singh, “A scalable
clustered  camera  system  for  multiple  object  tracking,”  EURASIP  Journal  on  Image  and  Video  Processing,  v.  2008, 
article  ID  542808,  2008. 

• Senem Velipasalar and Wayne Wolf, “Lessons from a distributed peer-to-peer smart tracker,” Elektrotechnik und
Informationstechnik, 2008, 125/10, 1-7.

•  Jason  Schlessman,  Mark  Lodato,  Burak  Ozer,  and  Wayne  Wolf,  “Heterogeneous  MPSoC  architectures  for  embedded 
computer  vision,”  in  2007  IEEE  International  Conference  on  Multimedia  and  Expo,  IEEE,  2007,  pp.  1870-1873. 

• We have filed a U.S. patent application (PCT/US2007/071501) on the SCCS system.


Collaborations  and  Leveraged  Funding 

This project leverages a DURIP grant awarded during this period. We have purchased several important pieces of
New Jersey on distributed surveillance systems. We received technical support for their MWIR cameras from Sensors
Unlimited/Goodrich  in  Princeton,  New  Jersey.  We  also  held  technical  discussions  with  Noble  Systems,  a  manufacturer  of 
MWIR sensors. We received programming environments from Texas Instruments for use with their DaVinci processor.

We have leveraged the PI’s startup package at Georgia Tech for financial support of additional students and equipment.

Students 
his Ph.D. during this reporting period.

Two  unsupported  M.S.  students  worked  on  projects  relevant  to  this  program:  Mihir  Wagli  (background  elimination  hardware), 
Palak  Shah  (MPI  measurements). 
• Number of undergraduates funded: 0

Technology Transfer

We  have  discussed  possible  infrared/visible  projects  with  the  Georgia  Logistics  Center,  located  at  the  Port  of  Savannah. 


Future  Plans 

We  plan  to  study  distributed  tracking  algorithms  using  vehicle-based  sensors.  Solving  this  problem  requires  jointly 
estimating  vehicle  and  target  positions.  We  plan  to  develop  new  MCMC  algorithms  to  perform  this  estimation  in  a  distributed 
network  of  vehicle-based  nodes. 


